Abstract


Testing large-scale dynamic network simulation packages such as
NetWatch [34] requires a large quantity of test data to be available for each
of the experiments. The test data includes initial topologies of agents’ social
networks and specification of knowledge networks for each of the agents to
fit an empirically derived distribution of knowledge.

Testing  the  software  on  machine-generated  data,  as  opposed  to  empirical 
data  only,  allows  the  user  to  conduct  repeatable  tests  that  stress  certain 
aspects  of  the  software  and  help  in  debugging  and  optimization  of  software 
performance. 


Keywords: Social Networks, social simulation, scale-free networks, cellular
networks, random graphs, software testing


1  Introduction 


Testing large-scale dynamic network simulation packages such as NetWatch [34]
requires a large quantity of test data to be available for each of the
experiments. The test data includes initial topologies of agents’ social networks
and specification of knowledge networks for each of the agents to fit an
empirically derived distribution of knowledge. Another task is creation of realistic
task structures that could be used to simulate performance of complex
interdependent projects by groups of agents.

The  main  concern  in  generation  of  artificial  data  is  its  realism.  Based 
on  open-source  empirical  data  (such  as  described  in  sec.  4),  the  artificial 
datasets  need  to  approximate  certain  qualities  or  parameters  found  in  the 
empirical  data.  However,  it  is  unclear  at  the  outset  what  parameters  need 
to  be  emulated  to  achieve  highest  fidelity  simulation. 

Frequently, theories of network topologies in a particular setting are
proposed. For example, a large amount of social network research relies on
assumptions made by Erdős [15] regarding topology and distances in random
graphs. As an elaboration of Erdős networks, small-world network
topologies [37] retain many properties of random graphs, yet provide a
degree of structural realism that maps to macro-level structures of social
networks and communities [27].

However, it is now clear that purely random graphs are not a good
approximation of the topology of social networks. Other proposed topologies include
scale-free networks [3], whose role in modeling social networks we discuss in
section 2.

While none of these theories has emerged as a clear winner and new ideas
of network topologies in large-scale social networks are frequently published,
it is important to make simulation tools independent of the models and
theories of initial network topology. Furthermore, a simulation tool that is proven
and validated through docking and comparison with empirical results can be
used as a means to test validity of multiple theories of network topology - or
test its own assumptions against all possible networks.

Testing  the  software  on  machine-generated  data,  as  opposed  to  empirical 
data  only,  allows  the  user  to  conduct  repeatable  tests  that  stress  certain 
aspects  of  the  software  and  help  in  debugging  and  optimization  of  software 
performance. 

As the number and complexity of social network analysis algorithms grow,
it becomes more and more important to test these algorithms for accuracy,
scalability, and robustness. We define robustness of a measurement algorithm as
a function of degradation of quality of measurements with decay of the data,
modelled as introduction of noise into inputs of the algorithms.

Robustness studies such as [6] and [11] measure the impact of decay in
random networks on accuracy of computation of standard social network analysis
metrics. Such rigorous tests require large amounts of data which can be easily
manipulated to introduce errors. Networks used as input to the robustness
study need to span different sizes and topologies, and be easily manipulated
to introduce a quantifiable amount of noise for robustness testing. This
problem is much easier to solve using synthetic (generated) data, where size and
topology of the network are controlled by generation functions [9]. SNA
algorithms then need to be tested against multiple network topologies. Moreover,
parameters of the network generator can be manipulated in a scientific
fashion, thus allowing the measurement algorithm to be also tested on possible
variations of the topology.
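
As an illustration of introducing a quantifiable amount of noise, the following
is a minimal Python sketch, not the procedure used in [6] or [11]. It assumes
an undirected network given as a list of node pairs; the function name
add_noise and the noise model (remove a fraction of existing edges and add the
same number of previously absent ones) are assumptions made for this example.

```python
import random
from itertools import combinations


def add_noise(nodes, edges, fraction, seed=None):
    """Return a noisy copy of an undirected edge set.

    round(fraction * |E|) existing edges are removed and the same number of
    previously absent edges are added, so the density stays constant.
    This noise model is an assumption made for illustration only.
    """
    rng = random.Random(seed)
    edge_set = {frozenset(e) for e in edges}
    absent = [frozenset(p) for p in combinations(nodes, 2)
              if frozenset(p) not in edge_set]
    n_flip = min(round(fraction * len(edge_set)), len(absent))
    # Sort before sampling so that a given seed always yields the same result.
    removed = rng.sample(sorted(edge_set, key=sorted), n_flip)
    added = rng.sample(absent, n_flip)
    return (edge_set - set(removed)) | set(added)
```

Running a network measure on the original and on the noisy copy for increasing
values of fraction gives one possible degradation curve for that measure.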

2 Terrorist Organizations and Scale-Free Networks

An argument has been made [30] that terrorist networks may exhibit
features of scale-free networks and can thus be treated as such in analysis and
derivation of attack scenarios.

Scale-free networks have been observed in many contexts ranging from
networks of airline traffic to sexual networks and Web link patterns. The
high probability of emergence of scale-free networks, as opposed to evenly
distributed random networks, is due to a number of factors, including:

•  Rapid  growth  confers  preference  to  early  entrants.  The  longer  a  node 
has  been  in  place  the  greater  the  number  of  links  to  it.  First  mover 
advantage  is  very  important. 

• In an environment of too much information, people link to nodes that are
easier to find, that is, nodes that are highly connected. Thus preferential
linking is self-reinforcing.

•  The  greater  the  capacity  of  the  hub  (bandwidth,  work  ethic,  etc.)  the 
faster  its  growth. 

It  has  also  been  observed  that  scale-free  networks  are  extremely  tolerant  of 
random  failures.  In  a  random  network,  a  small  number  of  random  failures  can 
collapse  the  network.  A  scale-free  network  can  absorb  random  failures  up  to 
80%  of  its  nodes  before  it  collapses.  The  reason  for  this  is  the  inhomogeneity 
of  the  nodes  on  the  network  -  failures  are  much  more  likely  to  occur  on 
relatively  small  nodes. 

However, scale-free networks are extremely vulnerable to intentional
attacks on their hubs. Attacks that simultaneously eliminate as few as 5-15% of
a scale-free network’s hubs can collapse the network. Simultaneity of an
attack on hubs is important. Scale-free networks can heal themselves rapidly if
an insufficient number of hubs necessary for a systemic collapse are removed.

Scale-free networks are also very vulnerable to epidemics. In random
networks, an epidemic needs to surpass a critical threshold (a number of nodes
infected) before it propagates system-wide. Below the threshold, the
epidemic dies out. Above the threshold, the epidemic spreads exponentially.
Recent evidence [28] indicates that the threshold for epidemics on scale-free
networks is zero.

However, the reality of terrorist networks does not fit neatly into the
scale-free network model. It has been observed [31] that non-state terrorist
networks are not only scale-free but also exhibit small-world properties. This
means that while large hubs still dominate the network, the presence of
tight clusters (cells) continues to provide local connectivity when the hubs are
removed.

For example, the attack on Al Qaeda’s Afghanistan training camps did not
collapse its network in any meaningful way. Rather, it atomized the network
into anonymous clusters of connectivity until the hubs could reassert their
priority again. Many of these clusters will still be able to conduct attacks
even without the global connectivity provided by the hubs.

Furthermore, critical terrorist social network hubs cannot be identified
based on the number of links alone. For example, Krebs observed [25] that
strong face-to-face social history is extremely important for trust
development in covert networks. Of similar importance is the relevance of skills
and training of agents inside a cell to the task at hand. Thus, the importance
of any individual within the network should be rated on a vector of factors
pertaining to their qualities as an individual as well as the types and qualities of
their links.

Rothenberg [31] notes that postulating a path of a set length from
everyone in the global network to everyone else (i.e. scale-free nature of a terrorist
network) runs contrary to the instructions for communication infrastructure
set forth in the Al Qaeda training manual [1]. Thus, if a terrorist network
was observed to be scale-free, it can be argued that its scale-free nature is
not a matter of design and can possibly be an artifact of the data collection
routines. For example, snowball sampling [19] is biased toward highly
connected nodes, so extensive use of this technique may result in observation of
scale-free core-periphery structures where none exist [5].


3 Developing the Formalism of a Cellular Network

Given the case studies of Al Qaeda and other terrorist networks, it is clear
that terrorist organizations cannot be adequately described as random graphs
or as scale-free networks. Therefore, a different model of terrorist networks
has emerged, namely cellular networks [31] [10] [12]. While this model may
not fit a simple mathematical definition such as a scale-free or small-world
network, its base is in empirical and field data [18]. In section 5.3, I will show
that cellular networks in fact are not characterized by a lack of a formal
representation but are defined through a more complex process which takes
as a goal improvement of fit between the model network and empirical data.

Cellular networks [7] are different from traditional organizational forms as
they replace a hierarchical structure and chain of command with sets of
quasi-independent cells, distributed command, and rapid ability to build larger cells
from sub-cells as the task or situation demands. In these networks, the cells
are often small and are only marginally connected to each other. The cells
are distributed geographically, and may take on tasks independently of any
central authority [8].

Rothenberg[31]  observed  a  number  of  properties  of  a  cellular  network: 

•  The  entire  network  is  a  connected  component. 

...It is likely that on the local level, individual ties are very
strong... On the higher level, individual ties are likely to be
weaker but the strength of association [people known in common,
doctrine] is likely to remain high...

• The network is redundant on every level: Each person can reach other
people by multiple routes - which can be used for transmission
of information as well as material. On the local level, there will be a
considerable structural equivalence [35], which will ameliorate the loss
of an individual. The redundancy in communication channels may also
be mirrored in the redundancy of groups engaged in a particular task.

•  On  the  local  level,  the  network  is  small  and  dynamic,  consisting  of 
small  cells  (4-6  people)  that  operate  with  relative  independence  and 
little  oversight  on  the  operational  level. 

• The network is not managed in a top-down fashion. Instead, its
command structure depends on vague directives and religious decrees, while
leaving local leaders the latitude to make operational decisions on their
own.

•  The  organizational  structure  of  a  terrorist  network  was  not  planned, 
but  emerged  from  the  local  constraints  that  mandated  maintenance  of 
secrecy  balanced  with  operational  efficiency. 

Each cell is, at least in part, functionally self-sufficient and is capable of
executing a task independently. Cells are loosely interconnected with each
other for purposes of exchanging information and resources. However, the
information is usually distributed on a need-to-know basis and new cell
members rarely have the same exact skills as current members. This essentially
makes each individual cell expendable. The removal of a cell generally does
not inflict permanent damage on the overall organization or convey
significant information about other cells. Essentially, the cellular network appears
to morph and evolve fluidly in response to anti-terrorist activity [32].

This leads to a hypothesis that cells throughout the network contain
structurally equivalent [17] and essential roles, such as ideological or charismatic
leaders, strategic leaders, resource concentrators and specialized experts.

Given this hypothesis, one can further reason that operations of a
particular cell will be affected in a negative way by the removal of an individual
filling one of these roles. I further posit that developing the cellular
network formalism into an empirically driven yet mathematically sound
concept is necessary for the creation of computational models that combine face
validity towards real-world data with veridicality towards formal models
of organizational evolution.

4 Open-Source Data on Terrorist Networks

Social  network  datasets  were  extremely  difficult  to  obtain  and  limited  in  size 
and  scope,  until  recently.  The  prevailing  methodology  for  collecting  social 
network  data  was  by  survey,  either  administered  to  an  entire  group  of  people 
or  collected  in  a  snowball  fashion.  Collection  of  social  network  data  was  done 
in  a  way  reminiscent  of  anthropological  data  collection  -  by  a  human  observer 
embedded  into  an  organization  to  be  studied. 

This  presented  a  number  of  problems.  First  of  all,  it  was  very  costly 
to  collect  all  but  the  smallest  of  datasets.  While  a  number  of  sampling 
strategies  were  investigated,  it  was  difficult  or  infeasible  to  canvass  a  larger 
organization  or  population.  Furthermore,  presence  of  an  observer  or  a  survey 
instrument  in  an  organization  inevitably  altered  the  behaviour  of  individuals 
in  the  organization. 

Finally,  for  some  networks,  especially  terrorist  networks,  it  is  physically 
impossible  to  collect  a  dataset  via  direct  survey  administration.  The  modus 
operandi  of  such  networks  is  covertness  and  this  necessarily  limits  the  data 
that  can  be  collected  on  them. 

Thus, for the study of terrorist organizations, one must obtain information
via indirect means. One approach to gathering indirect social network data
is via analysis of texts. Originally used as a manual coding technique, text
analysis has now been automated to extract network structure from corpora
of text based on co-appearance of people, organizations and other entities.
An example of such text coding is the representation of the Hamas
network (figure 1), extracted by AutoMap from a set of documents describing
the organizational structure and operational constraints of the Hamas terrorist
organization.
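
The idea of deriving ties from co-appearance can be sketched as follows. This
is a simplified illustration and not AutoMap's actual algorithm: it assumes
that a list of entity names is already known, matches them by plain
case-insensitive substring search, and treats each document as one
co-appearance window. The function name cooccurrence_network is hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(documents, entities):
    """Count how often each pair of known entities shares a document.

    Returns a Counter mapping (entity_a, entity_b) pairs, in alphabetical
    order, to the number of documents in which both names appear.
    """
    weights = Counter()
    for text in documents:
        lowered = text.lower()
        # Naive substring matching; real text coding needs entity resolution.
        present = sorted(e for e in entities if e.lower() in lowered)
        for a, b in combinations(present, 2):
            weights[(a, b)] += 1
    return weights
```

The resulting pair counts can be read as weighted edges of a person-to-person
or person-to-organization network.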

Between September 14, 2001 and November 2001, Valdis Krebs [25]
assembled a corpus of texts regarding events preceding the September 11th attacks.
Manual analysis of these texts yielded a dataset which became one of the
definitive sources of data on terrorist organizations and the structure of a
terrorist plot.

Since 2001, much larger datasets on covert networks have become available due to
both increased interest in the research and improvements in tools for machine
collection of network data.

Some of the newer, more complete datasets include those collected by
IntelCenter [23], R. Renfro [29] and M. Sageman [32].

Figure 1: Data on Hamas collected by AutoMap

In the aftermath of the September 11th attacks, it was noted that
coherent information sources on terrorism and terrorist groups were not available
to researchers [20]. Information was either available in fragmentary form, not
allowing comparison studies across incidents, groups or tactics, or made
available in written articles, which are not readily suitable for quantitative
analysis of terrorist networks. Data collected by intelligence and law-enforcement
agencies, while potentially better organized, is largely not available to the
research community due to restrictions on distribution of sensitive
information.

To  counter  the  information  scarcity,  a  number  of  institutions  developed 
unified  database  services  that  collected  and  made  available  publicly  accessible 
information  on  terrorist  organizations.  This  information  is  largely  collected 
from  open  source  media,  such  as  newspaper  and  magazine  articles,  and  other 
mass  media  sources. 

Such  open-source  databases  include: 

• RAND Terrorism Chronology Database [14] - including international
terror incidents between 1968 and 1997

• RAND-MIPT (Memorial Institute for Prevention of Terrorism)
Terrorism Incident Database [21], including domestic and international
terrorist incidents from 1998 to the present

• MIPT Indictment Database [33] - Terrorist indictments in the United
States since 1978.

Both RAND and MIPT databases rely on publicly available
information from reputable information sources, such as newspapers, radio and
television.

• IntelCenter Database (ICD) [22] includes information on terrorist
incidents, groups and individuals collected from public sources, including
not only traditional media outlets and public information (such as
indictments), but also information learned from Middle East-based news
wire services. Separately, IntelCenter also collects information from
Arabic chat-rooms and Internet-based publications - although the value of
such data is questionable and the data may be tainted by propaganda.

Figure 2: A Uniform Random Network

5  Generating  Person-to-Person  Networks 

5.1 Erdős Random Graphs

The study of random graphs dates back to the work of Erdős and Rényi
whose seminal papers [15] [16] laid the foundation for the theory of random
graphs.

There are three standard models for Erdős random graphs [2]. Each has
two parameters. One parameter controls the number of nodes in the graph
and one controls the density, or number of edges.

For example, the random graph model G(n, e) assigns uniform probability
to all graphs with n nodes and e edges while in the random graph model
G(n, p) each edge is chosen with probability p.
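
Both models are straightforward to implement. The sketch below is one
possible Python rendering; the function names gnp and gne and the
representation of a graph as a list of node pairs are choices made for this
example, and enumerating all node pairs is only practical for small n.

```python
import random
from itertools import combinations


def gnp(n, p, seed=None):
    """G(n, p): include each of the n*(n-1)/2 possible edges with probability p."""
    rng = random.Random(seed)
    return [pair for pair in combinations(range(n), 2) if rng.random() < p]


def gne(n, e, seed=None):
    """G(n, e): choose uniformly among all graphs with n nodes and e edges."""
    rng = random.Random(seed)
    return rng.sample(list(combinations(range(n), 2)), e)
```

In G(n, e) the edge count is fixed exactly, whereas in G(n, p) it varies from
run to run around an expected value of p * n * (n - 1) / 2.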

5.2  Scale-Free  Networks 

Figure 3: Distribution of centralities in an Erdős random network: (a) Degree,
(b) Closeness, (c) Betweenness, and (d) Eigenvector

One of the most interesting features of a large class of the complex networks
under study now is their scale-free behavior: each node of the network is
connected to some other k nodes. The number of connections obeys a
power-law distribution, i.e. $P(k) \sim k^{-\gamma}$, with $2 < \gamma < 3$ for
most networks considered.

Such networks are dubbed "scale-free" because the fluctuations of the
distribution around the average value of k are infinite (they do not possess
any particular scale). The difference between a scale-free network and a
random network (where every link between different nodes is present with a
probability p, resulting in a Poisson degree distribution) hints towards some
mechanisms that generated the observed network features. One of the most
celebrated models that explains the emergence of scale-free networks is the
Barabási-Albert (BA) model [4].

Figure 4: A Scale-Free Network generated by preferential attachment

According to the BA model, the two essential ingredients for the
formation of scale-free networks are growth and preferential attachment. Growth
implies that new nodes are added to the network over time at a more or
less constant rate. Preferential attachment means that a newly added node
connects preferentially to nodes that already have a high degree: a new node
tries to attach to authoritative nodes and the degree of a node is an effective
representation of its authoritativeness. It has been shown that, if the
probability to connect to a site is linearly proportional to its degree, then growth
and preferential attachment indeed generate scale-free networks [24].
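
Growth with linear preferential attachment can be sketched in a few lines of
Python. This is an illustrative implementation, not necessarily the generator
used to produce the figures; the function name barabasi_albert, the parameter
m (links added per new node) and the choice of m unconnected seed nodes are
assumptions of this example.

```python
import random


def barabasi_albert(n, m, seed=None):
    """Grow a network to n nodes; every new node links to m distinct
    existing nodes picked with probability proportional to their degree."""
    if not 1 <= m < n:
        raise ValueError("require 1 <= m < n")
    rng = random.Random(seed)
    edges = []
    degree_pool = []          # node i appears here once per incident edge
    targets = list(range(m))  # the first new node links to all m seed nodes
    for new in range(m, n):
        edges.extend((new, t) for t in targets)
        degree_pool.extend(targets)
        degree_pool.extend([new] * m)
        # Drawing uniformly from the pool is a degree-proportional choice.
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(degree_pool))
        targets = sorted(chosen)
    return edges
```

Keeping a pool in which each node is repeated once per incident edge avoids
recomputing degree-based probabilities at every step.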

5.3  Cellular  Networks 

The  above-mentioned  algorithms  for  generating  simulated  organizational  data 
can  be  summarized  as  creating  an  approximation  of  real  social  phenomena 
(i.e.,  organizational  structure)  by  means  of  an  analytically  solvable  function 
or  a  statistical  mechanism. 

Below we present an alternative approach, which relies on the
observations of organizational structure of extant covert networks via creation of a
network profile.

We  define  a  generative  network  profile  as  a  collection  of  observations 
and  measurements  that,  when  taken  together,  can  be  used  as  a  generative 
function  for  creating  networks  similar  to  ones  observed  in  the  real  world. 

Figure 5: Distribution of centralities in a scale-free network: (a) Degree,
(b) Closeness, (c) Betweenness, and (d) Eigenvector

The method of generating simulated organizational structures from
profiles should be generalizable to many different types of organizations.
However, for every type of organization the components of a generative profile
would be different.

In this section we present a generative profile of a cellular covert network
based on the publicly available dataset on the September 11th hijackers
collected by Krebs [25]. From these data, the following
profile of the structure of covert networks has been derived [12]:

•  The  network  consists  of  small  cells  (mean  cell  size  of  6  members)  with 
very  low  interconnection  between  cells. 

•  Internally,  the  cells  exhibit  dense  communication  patterns. 

•  There  is  a  very  low  probability  of  two  individuals  communicating  by 
chance  (0.007). 

•  The  probability  of  triad  closure  (link  from  x  to  y  being  more  likely  if 
both  x  and  y  are  linked  to  third  party  z)  is  0.181. 

• Senior members of each of the cells are often also parts of other cells
and interact with other senior members on the network.

• Cell leaders are more knowledgeable than other members.

• Cell members share an ideological doctrine but also possess specialized
knowledge (e.g. bombmakers, drivers, operatives).

• Cells use information technologies and electronic communication.

The aforementioned parameters form a statistical profile from which we
can generate simulated organizational networks. The plot in figure 6 shows
a covert network generated using the parameters specified above.

Figure 6: Red Team: A Cellular Covert Network

The algorithm for generating a network based on the above profile is
given in listing 1.
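
For readers who prefer executable code, the following Python sketch
approximates that procedure under stated simplifications. The cell-size
distribution (standard deviation of 0.17 times the mean) follows listing 1 and
the default mean cell size of 6 follows the profile; the default densities
intra_density and inter_density are placeholder assumptions, agents are
assigned by shuffling rather than by repeated random picks, and the
triad-closure adjustment step is omitted.

```python
import random
from itertools import combinations


def cellular_network(n_agents, mean_cell_size=6, intra_density=0.8,
                     inter_density=0.3, seed=None):
    """Simplified cellular network generator (no triad-closure tuning)."""
    rng = random.Random(seed)

    # Draw normally distributed cell sizes until all agents are covered.
    sizes = []
    while sum(sizes) < n_agents:
        size = round(rng.gauss(mean_cell_size, 0.17 * mean_cell_size))
        sizes.append(max(2, size))
    sizes[-1] -= sum(sizes) - n_agents  # trim the last cell to fit

    # Assign agents to cells at random.
    agents = list(range(n_agents))
    rng.shuffle(agents)
    cells, start = [], 0
    for size in sizes:
        cells.append(agents[start:start + size])
        start += size

    # Pick a leader per cell and fill in a uniform network inside each cell.
    edges, leaders = set(), []
    for cell in cells:
        leaders.append(rng.choice(cell))
        for a, b in combinations(cell, 2):
            if rng.random() < intra_density:
                edges.add(frozenset((a, b)))

    # Connect cells only through their leaders.
    for a, b in combinations(leaders, 2):
        if rng.random() < inter_density:
            edges.add(frozenset((a, b)))
    return cells, leaders, edges
```

Because inter-cell links exist only between leaders, removing a leader in this
sketch disconnects that cell from the rest of the network, which mirrors the
role of senior members in the profile.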

6 Generalization and Optimization of Network Profiles

At this point, the choice of profile components lies in the hands of the
researcher and creation of a profile is a manual task. However, creation of such
profiles can be represented as an optimization problem.

Creation of general-purpose generative profiles can be done using
the following assumptions:

• Let the network consist of a finite number of layered groupings. For
example, a corporate network may be viewed as a collection of (a) people,
(b) workgroups, (c) departments, (d) divisions, and (e) an entire
corporation - resulting in 5 levels of groupings.

• Assume that groupings at each of the levels (e.g. departments) connect
to each other with a network structure that can be expressed with a
generative function (uniform, scale-free, etc.).

Listing 1: "Generating Cellular Networks"

//Generate cells
CREATE cells with
    cell_size() = normally distributed random variable
        (mean = average cell size, std.dev = 0.17 * mean);

//Assign agents to cells
FOR all agents DO
    current_cell = random cell
    IF current_cell is not full THEN
        assign the agent to current_cell
    ELSE
        pick a new cell; repeat this operation
    END IF
END FOR

//Fill in connections inside cells
FOR all cells DO
    PICK a random agent inside the cell to serve as a leader

    //Internally, generate a uniform network
    FOR all agents inside the cell DO
        generate links within cell with the given density
    END FOR

    //Bring the probability of triad closure in line with the measurements
    IF probability of triad closure is significantly less than measured value THEN
        add a small random number of edges; repeat the measurements
    ELSE IF probability of triad closure is significantly greater than measured value THEN
        drop a small random number of edges; repeat the measurements
    END IF
END FOR

//Connect cells through their leaders
FOR all cell leaders picked in previous step DO
    generate links among cell leaders to produce required inter-cell density
END FOR

Figure 7: Distribution of centralities in a cellular network: (a) Degree,
(b) Closeness, (c) Betweenness, and (d) Eigenvector

A generalized algorithm for generation of a complex organizational network
can be described as a traversal of the hierarchy of layered groupings from
most specific to most general while applying a generative function for each
of the layers to generate edges at the given layer.

Thus, generation of a complex network can be parameterized with a
profile consisting of (a) number of layers, (b) size of groupings at each layer, and
(c) a simple generative function for each layer.

Given that the number of simple generative functions is finite, such
parametrization can then be viewed as an optimization problem, defined as traversal of
a state-space of generative profiles and evaluating the fit of each generative
profile to a population of known networks.
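
One way to make the layered traversal concrete is sketched below in Python. It
is an illustration under several assumptions that do not come from the text:
a profile is a list of (group_size, generator) pairs ordered from most
specific to most general, each grouping is represented at the next layer by
its first member, and the two generators shown (complete and uniform) stand in
for the finite set of simple generative functions.

```python
import random
from itertools import combinations


def complete(units, rng):
    """Connect every pair of units in a grouping."""
    return list(combinations(units, 2))


def uniform(units, rng, p=0.5):
    """Connect each pair of units independently with probability p."""
    return [pair for pair in combinations(units, 2) if rng.random() < p]


def generate_layered(n_agents, profile, seed=None):
    """Generate edges layer by layer, from most specific to most general."""
    rng = random.Random(seed)
    units = list(range(n_agents))  # a unit is named by its representative
    edges = set()
    for group_size, generator in profile:
        groups = [units[i:i + group_size]
                  for i in range(0, len(units), group_size)]
        for group in groups:
            edges.update(frozenset(e) for e in generator(group, rng))
        # Each grouping becomes a single unit at the next layer.
        units = [group[0] for group in groups]
    return edges
```

A search over profiles would then call generate_layered for each candidate
profile and score the result against the known networks; the scoring function
is left open here, as it is in the text.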


Figure 8: Knowledge Distribution of NetWatch Agents (legend: High Knowledge,
Low Knowledge, No Knowledge; labeled region: task-related specialist knowledge)

7  Generating  Knowledge  Networks 

Knowledge is represented in the MetaMatrix as a set of nodes, with each
representing facts or groups of facts. Knowledge that an agent possesses is
represented as an edge between an Agent node and a Knowledge node; knowledge
that is required to accomplish a primitive task is represented as an edge
between Task and Knowledge nodes; etc.

Based  on  data  available  on  structure  of  terrorist  training[12],  NetWatch 
generates  agent-knowledge  networks  using  a  profile  of  the  knowledge  network 
of  a  cellular  organization. 

The knowledge that the agents possess is divided into four main
categories. These categories encompass (a) general doctrine and ideology of
the organization, (b) shared training and skills in the MO of the organization
(e.g. communication procedures, clandestine operations), (c) specialist
task-related skills (e.g. bomb-making, sniper skills, getaway car driving), and (d)
knowledge of overall organizational structure.

The  algorithm  for  generating  the  knowledge  network  presumes  the  exis¬ 
tence  of  well-formed  cells,  as  generated  by  the  algorithm  in  section  5.3.  The 
following  principles  are  followed: 

• Cell leaders are more knowledgeable than other members. As cell
leaders are recruited from the ranks of experienced operatives, their
doctrinal knowledge is high and they possess many of the shared skills of the
other agents. They also possess a small amount of knowledge in each
of the specialist areas. This knowledge is not sufficient to replace
specialist agents but is sufficient to proficiently delegate subtasks during
execution of a complex operation.

• Cell members share an ideological doctrine and a modus operandi,
further referred to as "shared knowledge". Adherence to a militant
ideology is a driving factor in recruiting of operatives in terrorist
organizations and is further amplified during training or studies in a militant
religious academy.

Shared M.O. skills are derived from shared training camp experiences
that terrorist organization recruits undergo. Shared skills include
communication procedures, clandestine operation skills, and preservation of
secrecy during planning and preparation of operations.

• Cell members possess specialized knowledge that outlines their specific
function within a cell; these facts are further referred to as "specialist
knowledge".

•  A specialized portion of the knowledge network deals with overall
knowledge of the organizational structure and policies. This knowledge is
privileged information distributed only to cell leaders and is further
referred to as "privileged knowledge". However, rank-and-file cell
members may obtain small amounts of the privileged information through
interaction with other agents outside the primary cell.

The algorithm that generates knowledge networks as outlined above is
fairly simple. The knowledge network is divided into portions based on the
purpose of each fact (e.g. shared knowledge, specialist knowledge,
privileged knowledge) (see figure 8).

Then, for each agent $a_i$ and fact $f_k$ the algorithm generates a
probability $P_{ik}$ of existence of an edge $a_i - f_k$ based on the group
that the agent belongs to (i.e. cell leader vs. rank-and-file) and the group
that the fact belongs to (i.e. shared, specialist or privileged).

The  edges  are  then  instantiated  with  a  roll  of  the  dice. 
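The procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the generator used for the experiments: the probability table `EDGE_PROB`, the group labels, and the function name are hypothetical, and the sketch does not model each member's individual specialty.

```python
import random

# Hypothetical edge probabilities, indexed as [agent group][fact group].
# The numbers are illustrative assumptions, not values from this report.
EDGE_PROB = {
    "leader": {"shared": 0.9, "specialist": 0.2, "privileged": 0.8},
    "member": {"shared": 0.7, "specialist": 0.3, "privileged": 0.05},
}


def generate_knowledge_network(agent_groups, fact_groups, edge_prob=EDGE_PROB, seed=None):
    """Return the set of (agent, fact) edges of a knowledge network.

    agent_groups maps each agent to "leader" or "member";
    fact_groups maps each fact to "shared", "specialist" or "privileged".
    """
    rng = random.Random(seed)
    edges = set()
    for agent, agent_group in agent_groups.items():
        for fact, fact_group in fact_groups.items():
            p = edge_prob[agent_group][fact_group]  # probability of this edge
            if rng.random() < p:  # the "roll of the dice"
                edges.add((agent, fact))
    return edges
```

Passing a fixed `seed` makes a generated network repeatable, which suits the repeatable tests that motivate machine-generated data.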

7.0.1  Algorithm  Parameters 

The knowledge network generator depends on the following parameters:

•  Proportion of shared knowledge

•  Proportion of specialist knowledge

•  Proportion of privileged knowledge

Figure 9: Construction of a Task Network as a Precedence Graph
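One plausible reading of the three proportion parameters is that they fix the fraction of the fact set placed in each group. The sketch below follows that reading; the function name `partition_facts` and the use of integer fact indices are assumptions made for illustration.

```python
def partition_facts(n_facts, shared, specialist, privileged):
    """Label fact indices 0..n_facts-1 as shared, specialist or privileged.

    The three arguments are relative proportions; they are normalized, so
    they need not sum to one.
    """
    total = shared + specialist + privileged
    if total <= 0:
        raise ValueError("proportions must sum to a positive value")
    n_shared = min(n_facts, round(n_facts * shared / total))
    n_specialist = min(n_facts - n_shared, round(n_facts * specialist / total))
    groups = {}
    for k in range(n_facts):
        if k < n_shared:
            groups[k] = "shared"
        elif k < n_shared + n_specialist:
            groups[k] = "specialist"
        else:
            groups[k] = "privileged"  # whatever remains after rounding
    return groups
```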

8  Generating  Task  Structures 

The task network consists of a set of primitive and compound tasks with
their precedence relations expressed as Task - Task edges in the
MetaMatrix. The complexity of the task network in terms of feasibility of
execution can be controlled by varying the average connectivity (sum of
predecessors and successors) of a task [13]. This parameter can essentially
be thought of as controlling the parallelism within the task network.
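As an illustration of how average connectivity can serve as the control parameter, the following Python sketch draws a random precedence graph with a requested mean connectivity. It is a simplified stand-in under stated assumptions: the function names are hypothetical, tasks are plain integers, and the distinction between primitive and compound tasks is not modeled.

```python
import random
from itertools import combinations


def generate_task_network(n_tasks, avg_connectivity, seed=None):
    """Return a sorted list of precedence edges (i, j): task i precedes task j."""
    rng = random.Random(seed)
    # Only pairs with i < j are allowed, so the graph cannot contain a cycle.
    candidates = list(combinations(range(n_tasks), 2))
    # Each edge adds one successor and one predecessor, so the mean
    # connectivity equals 2 * edges / tasks.
    n_edges = min(len(candidates), round(avg_connectivity * n_tasks / 2))
    return sorted(rng.sample(candidates, n_edges))


def mean_connectivity(n_tasks, edges):
    """Mean number of predecessors plus successors per task."""
    return 2 * len(edges) / n_tasks
```

A low connectivity leaves many tasks without precedence constraints between them, so they can run in parallel; a high connectivity forces a more sequential ordering.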

If the people-to-people network was generated as a cellular network,
assignments of people to subtasks (Person - Task edges) are uniformly
distributed within each cell. This results in various degrees of subtask
difficulty (amount of resource seeking and delegation required to accomplish
the task). When people-to-people networks are created as random or
scale-free graphs, the task assignments are distributed uniformly throughout
the entire network, which results in some tasks being infeasible.
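The two assignment modes can be contrasted with a short Python sketch. The mapping from cells to the tasks they own (`cell_tasks`) is an assumption introduced here for illustration, as are the function names; the report does not specify how tasks are allotted to cells.

```python
import random


def assign_within_cells(cells, cell_tasks, seed=None):
    """Assign each task to a uniformly chosen member of the cell that owns it.

    cells maps a cell id to its list of members; cell_tasks maps a cell id
    to the tasks allotted to that cell (this allotment is an assumption).
    """
    rng = random.Random(seed)
    return [(rng.choice(cells[cell]), task)
            for cell, tasks in cell_tasks.items()
            for task in tasks]


def assign_across_network(people, tasks, seed=None):
    """Assign each task to a person chosen uniformly from the whole network."""
    rng = random.Random(seed)
    return [(rng.choice(people), task) for task in tasks]
```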

9  Scalability 

To estimate efficiency of the network generation algorithms, we conducted
timing runs of each of the algorithms for generation of people-to-people
networks: Erdős random graphs, scale-free networks with preferential
attachment, and cellular networks. We varied the size of the network to be
generated from 100 to 3500 nodes.

Figure 10: Time requirements to generate networks (horizontal axis: Number of Nodes * 100)

Figure 10 shows the time in seconds to generate a network of a given
size with each of the algorithms. The least efficient of the algorithms is
the preferential attachment algorithm, whose generation time grows
exponentially with network size. Use of this algorithm becomes impractical
for networks over 2000 agents, where generation of the graph took
approximately 10000 seconds, or a little under 3 hours. While the
computational complexity of this algorithm is very high, it can be executed
on a parallel machine in near-linear time [26].

Erdős random graphs have been shown [15] to have quadratic complexity
($\Theta(n^2)$). However, one iteration of edge generation is a very fast
operation, so the algorithm remains practical for generating networks of up
to 20000 nodes (generation time is 120 seconds).
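The quadratic behavior comes from examining every pair of nodes once. A minimal Python sketch of such a generator, together with a simple timing harness, is given below; the function names are hypothetical, and timings obtained with it depend on the machine and will not reproduce the figures reported here.

```python
import random
import time
from itertools import combinations


def erdos_random_graph(n, p, seed=None):
    """Return the edges of a random graph on n nodes with edge probability p."""
    rng = random.Random(seed)
    # One cheap random draw per node pair: n * (n - 1) / 2 draws in total.
    return [(i, j) for i, j in combinations(range(n), 2) if rng.random() < p]


def time_generation(sizes, p=0.01):
    """Return a dict mapping each network size to its generation time in seconds."""
    timings = {}
    for n in sizes:
        start = time.perf_counter()
        erdos_random_graph(n, p, seed=0)
        timings[n] = time.perf_counter() - start
    return timings
```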

The cellular network generation algorithm performs in near-linear time
because cells are small and self-contained. The computational complexity
of the cellular network generator is

$$\Theta\left(\sigma_{cell} \frac{n}{k} k^2 + \sigma_{intercell}\, n\right) = \Theta\left(\sigma_{cell}\, n k + \sigma_{intercell}\, n\right)$$

where $n$ is the number of nodes, $k$ is the mean size of a cell, and
$\sigma_{cell}$ and $\sigma_{intercell}$ are, respectively, the densities
inside the cell and between cells. Thus, when $k$ is much smaller than $n$,
the complexity of the cellular network generator is close to $\Theta(n)$.
In practical terms, this means that even very large networks can be
generated in relatively short times, with a 20,000 node network taking less
than 20 seconds to generate.
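To make the difference in growth rates concrete, the sketch below evaluates the cellular cost expression, read as density inside cells times (n / k) cells times k squared pairs per cell, plus density between cells times n, and compares it with a count of all node pairs. The function names and the example parameter values are assumptions chosen for illustration; they are not measurements from this report.

```python
def cellular_cost(n, k, sigma_cell, sigma_intercell):
    """Operation count sigma_cell * (n / k) * k**2 + sigma_intercell * n."""
    return sigma_cell * (n / k) * k ** 2 + sigma_intercell * n


def quadratic_cost(n):
    """Operation count of a generator that examines every pair of nodes."""
    return n ** 2
```

With the hypothetical values n = 20000, k = 10, a density of 0.8 inside cells and 0.05 between cells, the cellular count is about 1.6 * 10^5, against 4 * 10^8 for the quadratic count. Doubling n doubles the cellular count but quadruples the quadratic one.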

10  Conclusion 

All of the network generation algorithms described above are used as a means
of testing NetWatch, a large-scale multi-agent simulation of covert networks.

While the realism of data generated by any of these algorithms can be
disputed and nothing is more realistic than empirical data, the use of
diverse techniques for generating initial data allows the simulation
researcher to test the multi-agent system on networks of widely varying
sizes and topologies. Because little empirical data is available, this is
currently not possible without resorting to artificially generated data.

This report is not comprehensive with regard to generation of all possible
network topologies. In this work, we did not consider small-world networks,
as generation of small-world topologies is addressed well in [27] and [37].
Further, we did not consider issues of generating hierarchical networks.

In the field of modeling social and organizational networks, it is important
to address organizations as comprehensive network structures, incorporating
structures of task interdependency, information and resource requirements
as well as person-to-person structures. This comprehensive approach would
allow modeling organizations based on their form, e.g. departmental,
functional, or matrix organizations.

While the generalized generative approach described in section 6 allows
for wide flexibility in the topology of generated networks, it is not designed
for modeling organically emerging network forms, such as those of markets.
For example, a market-driven network may exhibit emergent segmentation
processes [36], which, due to the complexity of the market process, can
only be generated via simulation of the market environment.

As a software engineering tool, the network generation package provides
a consistent interface to all of its generation functions, thereby enabling
the user (e.g. NetWatch) to test performance of the simulation tools on a
wide variety of source networks. This also forces the simulation to remain
independent of the initial network topology and thus allows for multi-theory
testing of simulation tools.